Mapping of SARS-CoV-2 spike protein evolution during first and second waves of COVID-19 infections in India

Aim: This study investigated the evolution of the SARS-CoV-2 spike protein during the first and second waves of COVID-19 infections in India. Materials & Methods: Detailed mutation analysis was performed on 763 samples taken from GISAID for the ten most affected Indian states between March 2020 and August 2021. Results: The study revealed 242 mutations corresponding to 207 sites. Fifty-one novel mutations emerged during the assessment period, including many associated with higher transmissibility and immune evasion. The highest number of mutations per spike protein also rose from 5 (first wave) to 13 (second wave). Conclusion: The study identified mutation-rich and mutation-free regions in the spike protein. The conserved spike regions can be useful for designing future diagnostics, vaccines and therapeutics.


Introduction
After the first three cases of COVID-19 were reported from Kerala, India on 30 January 2020, a significant number of positive cases began to be registered from March 2020 onwards. The first wave of COVID-19 infections in India peaked in September 2020, with about 100,000 cases reported per day. From February 2021 onwards, India battled a brutal second wave of COVID-19 infections with severe consequences. The daily number of cases rose sharply, crossing 400,000 cases and 4000 deaths per day in early May 2021 [1]. Compared with the first wave (peak of about 100,000 daily cases in September 2020), the second wave showed a sharp increase in positivity rate, which surged from 1.62% on 1 March 2021 to approximately 20% on 13 May 2021 [1]. This created an emergency situation in the country, with reduced supplies and increased deaths, especially in the young population [2,3].
Multiple factors might have contributed to this sudden spike in cases, such as the introduction of highly infectious SARS-CoV-2 lineages, a complex interplay between mutant strains and violation of COVID-19 appropriate behavior. The rollout of vaccination programs from 16 January 2021 and the resultant natural selection pressure on the viral genomes might have induced novel mutations and contributed to the further evolution of circulating lineages. The emergence of the novel SARS-CoV-2 lineage B.1.617.2 (Delta) in December 2020 also fueled the surge in daily infections and drove the second COVID-19 wave in India [4].
Since the emergence of COVID-19 in China in 2019, several variant lineages of SARS-CoV-2, including the current B.1.1.529 Omicron variant, have emerged across the world. The variants have been classified as variants of concern (VOCs), variants of interest (VOIs), variants being monitored (VBMs) and variants of high consequence (VOHCs) based on their differences in transmissibility, virulence or efficacy of diagnostics, vaccines and therapeutics [5]. Currently, the two listed variants of concern (VOCs) include Delta (B. There are no VOIs and VOHCs listed currently [5]. These novel SARS-CoV-2 lineages are better sustained than the Wuhan strain (clade O) because the accumulated novel mutations provide fitness advantages: escape from immune surveillance, high virulence, pathogenicity and better transmissibility, resulting in deadlier resurgent outbreaks of infection globally. Many of these mutations lie in the spike glycoprotein of SARS-CoV-2, in which more than 5,000 mutations have already been reported [6,7].
The surface spike (S) protein, present as small spikes on the SARS-CoV-2 surface, is responsible for host cell receptor recognition and viral entry. It is encoded by the S gene (3822 nt, positions 21563-25384) and consists of 1273 amino acids [8]. At the N-terminus, the S protein has a signal peptide (amino acids 1-13), which is followed by the S1 and S2 subunits (Figure 1). The spike protein remains inactive in its native state. It is activated with the help of host cell membrane proteases, specifically transmembrane protease serine 2 (TMPRSS2), which upon receptor recognition cleaves it into the two subunits S1 and S2 at a furin-cleavage site [9,10]. S1 contains the receptor binding domain (RBD), which is responsible for recognition of, interaction with and binding to the host cell angiotensin-converting enzyme 2 (ACE2) receptor. The S2 subunit brings about fusion of the viral and host cell membranes and enables viral entry into the host. The S1 and S2 subunits are further divided into various domains and sub-domains (Figure 1).
In a rapidly evolving genome like that of SARS-CoV-2, the characterization of mutations provides significant information for assessing the mechanisms linked to pathogenesis, immune evasion and viral drug resistance, and also offers useful insights into the evolutionary patterns and spread of the virus. Furthermore, since the S protein has been an important target for designing anti-COVID drugs, diagnostics and vaccines [11], information about conserved spike domains and sub-domains, if any, can be especially useful for designing diagnostics, therapeutics and/or vaccines. In the present study we have undertaken a detailed analysis of Indian SARS-CoV-2 spike protein mutations in 763 randomly selected samples from ten Indian states deposited in the Global Initiative on Sharing All Influenza Data (GISAID) [12] database between March 2020 and August 2021, and compared them to the prototype Wuhan sample (GenBank no. MN908947). The assessment period of about one and a half years includes both the first and second waves of COVID-19 infections in India. The inferences of this study will be helpful in understanding the evolutionary trajectory of the virus and in designing and supporting a spike protein-based public health response.
The mutations were analyzed on the basis of their presence or absence in different regions of the spike protein. The mutation density was calculated as the number of mutations observed in a particular spike protein region divided by its size in base pairs, while the frequency of a particular mutation was calculated as the number of samples showing that mutation divided by the total number of samples analyzed [13]. The three-dimensional crystal structure of the spike protein available in the Protein Data Bank (PDB; code: 6LZG) [14] was used to map the mutations located in the RBD.

Mutation analysis of the prevalent SARS-CoV-2 lineages in ten Indian states

Distribution of mutations in the spike protein
A total of 242 mutations corresponding to 207 sites in the spike protein were recorded during the assessment period (Table 1 & Figure 2). D614G was the only mutation present in analyzed spike proteins from all ten states between March 2020 and August 2021. Other prevalent mutations showed differential distribution across states: N501Y and T716I (present in seven states), H69del, V70del, Y144del, N440K, P681H (in six states), D215G, K417N, E484K, Q671H and A701V (in five states), L5F, L18F, D80A, Q613H, Q675H (in four states) and Q52R, T95I, W152L, E154K, V382L, L452R, E484Q, A520S, F888L and D1135Y (in three states) (Table 2 & Figure 2). There were 36 mutations that were present in two states each, and the remaining mutations were each found in a single state (Table 2 & Figure 2).

Distribution of mutations among domains & subdomains of spike protein
The observed 242 mutations were scattered through all the domains and subdomains of the spike protein except the S1B-S1C and S1C-S1D linkers and the fusion peptide domains, in which no mutations were found (Table 1 & Figure 2). Mutations that were observed in just a single sample were termed unique. The highest number of mutations (105; 43.4%) occurred in the S1A domain, with 81 of them listed as unique. It was followed by the S1B domain with 35 mutations (14.5%), 21 of them unique, and the β-sheet domain with 15 mutations (6.9%), 13 of them unique (Table 1 & Figure 2).

Frequent spike protein mutations
As many as 170 of the 242 observed mutations were found to be unique. Twenty-eight mutations that appeared in more than 10 samples each were listed as frequent mutations. These frequent mutations, with their number of occurrences and corresponding mutation frequency given in decreasing order in parentheses, included: D614G (697; 0.  Table 1). The remaining mutations showed between two and ten occurrences (Tables 1 & 2). Mutations with five to nine occurrences are shown in bold, and those with 10 or more occurrences in bold and underlined, in Table 2.
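The occurrence-based bookkeeping described in this section can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names, the `intermediate` label and the toy sample list are hypothetical, and each sample is assumed to be available as a list of mutation labels (an empty list standing for a clade O sequence). Under the frequency definition used in this study, the 697 occurrences of D614G among 763 samples would correspond to a frequency of roughly 0.91.

```python
from collections import Counter


def mutation_counts(samples):
    """Count, for each mutation label, how many samples carry it."""
    counts = Counter()
    for mutations in samples:
        # set() ensures a mutation is counted at most once per sample
        counts.update(set(mutations))
    return counts


def mutation_frequency(occurrences, total_samples):
    """Frequency = samples showing the mutation / total samples analyzed."""
    return occurrences / total_samples


def classify(occurrences, frequent_cutoff=10):
    """Label a mutation by its occurrence count.

    'unique' (one sample) and 'frequent' (more than 10 samples) follow the
    text; 'intermediate' is a hypothetical label for everything in between.
    """
    if occurrences == 1:
        return "unique"
    if occurrences > frequent_cutoff:
        return "frequent"
    return "intermediate"


# Hypothetical toy data: four samples, the last one without any mutation.
toy_samples = [["D614G", "N501Y"], ["D614G"], ["D614G", "T95I"], []]
counts = mutation_counts(toy_samples)
d614g_toy_frequency = mutation_frequency(counts["D614G"], len(toy_samples))
```

Counting each mutation once per sample keeps the frequency bounded by 1, which matches the per-sample definition given in the methods.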

Mutation density in different regions of spike protein
The region-wise mutation density in the human SARS-CoV-2 spike protein is depicted in Figure 3. The highest mutation density (0.55) was observed in the protease cleavage site, followed by 0.35 in the S1-S2 subunit linker region, 0.34 in the S1A domain, and 0.24 each in the cytoplasmic region and the β-sheet domain. The S1B domain, which forms the receptor binding domain, and the transmembrane region showed a moderate mutation density of 0.17 each in the analyzed samples.

Table 2. Distribution of mutation types and sites in SARS-CoV-2 spike proteins in Indian states. Occurrences are indicated in parentheses; 5 to 9 occurrences are represented in bold; 10 and above are represented as bold and underlined.
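The region-wise density comparison can be sketched in a few lines. The helper names and the toy region table below are hypothetical, and the region lengths are placeholders rather than the actual spike domain sizes; only the ratio itself (mutations observed in a region divided by the region's size) comes from the methods.

```python
def mutation_density(n_mutations, region_length):
    """Mutations observed in a region divided by the size of that region."""
    if region_length <= 0:
        raise ValueError("region_length must be positive")
    return n_mutations / region_length


def rank_regions(regions):
    """Sort regions from highest to lowest mutation density.

    `regions` maps a region name to (n_mutations, region_length).
    """
    densities = {
        name: mutation_density(n, length)
        for name, (n, length) in regions.items()
    }
    return sorted(densities.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy regions; counts and lengths are illustrative only.
toy_regions = {
    "region_a": (30, 100),
    "region_b": (5, 10),
    "region_c": (0, 20),
}
ranking = rank_regions(toy_regions)
```

Normalizing by region size is what lets a short region such as a cleavage site rank above a long domain that carries more mutations in absolute terms.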

Spike protein sites with multiple mutation types
We observed 34 sites in the analyzed spike proteins that harbored two or three different mutations at the same position in the spike protein sequence. Two sites, 242 and 484, displayed three mutations each. At the former site, the amino acid leucine (L) was either deleted or changed to valine (V) or proline (P), while at the latter, glutamic acid (E) changed to glutamine (Q), lysine (K) or aspartic acid (D). At the remaining 32 sites, two changes each were observed (Figure 4).

[Figure 4: site positions 21, 27, 52, 69, 70, 75, 77, 80, 132, 138, 144, 153, 156, 172, 181, 186, 242, 243, 261, 263, 477, 484, 522, 570, 583, 677, 681, 688, 701, 936, 943, 950, 1073, 1187]
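Finding sites that carry more than one mutation type amounts to grouping mutation labels by position. The sketch below assumes labels of the form `L242del` or `E484K` (reference residue, position, then a residue, `del` or `stop`); the regular expression and function names are illustrative, and range-style labels such as `del241/243` are deliberately not handled.

```python
import re
from collections import defaultdict

# Reference residue, 1-based position, then a substitution, "del" or "stop".
MUTATION_RE = re.compile(r"^([A-Z])(\d+)([A-Z]|del|stop)$")


def parse_mutation(label):
    """Split a label such as 'E484K' into ('E', 484, 'K')."""
    match = MUTATION_RE.match(label)
    if match is None:
        raise ValueError(f"unrecognized mutation label: {label!r}")
    ref, pos, alt = match.groups()
    return ref, int(pos), alt


def sites_with_multiple_changes(labels):
    """Return {position: sorted changes} for sites with two or more changes."""
    by_site = defaultdict(set)
    for label in labels:
        _, pos, alt = parse_mutation(label)
        by_site[pos].add(alt)
    return {pos: sorted(alts) for pos, alts in by_site.items() if len(alts) > 1}


# The changes reported above for sites 242 and 484, plus one single-change site.
labels = ["L242del", "L242V", "L242P", "E484Q", "E484K", "E484D", "D614G"]
multi = sites_with_multiple_changes(labels)
```

Using a set per position means repeated observations of the same change in different samples do not inflate the number of mutation types at a site.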

Spike proteins with maximum mutations
Seven hundred thirteen Indian human SARS-CoV-2 spike proteins carried from one to thirteen mutations. The highest number of mutations (13)

Mapping of observed mutations in RBD of SARS-CoV-2 spike protein
The RBD of the spike protein interacts with the host ACE2 receptor to mediate viral entry. The RBD region of spike proteins from ten different states revealed 35 Table 2). The state-wise mutation types and sites of the human SARS-CoV-2 spike protein in the RBD, by geographical location, are included in Table 2.

Samples without D614G
Although mutation D614G was registered in samples from all ten states, there were 16 samples from five states viz. which did not possess this most widely occurring mutation. These samples were collected between March 2020 and July 2020 (Table 1). All the samples from the remaining five states, and the samples from the aforementioned states after July 2020, contained the D614G mutation either singly or in combination with other mutations.

Preferential site specific positive selection pattern for spike amino acid substitutions
Interestingly, we observed a preferential site-specific positive selection pattern for particular amino acid substitutions in the spike protein. For instance, there were 11 sites (95, 240, 286, 307, 430, 547, 572, 678, 716, 859 and 1009) where the amino acid threonine (T) changed to isoleucine (I; 23 occurrences) across all the analyzed samples. Of these, T95I and T716I were the most frequent. At only three sites, viz. 19 (Chhattisgarh), 478 (Chhattisgarh) and 1116 (Gujarat), T was observed to change to arginine (R), lysine (K) and alanine (A), respectively (Table 2). Likewise, the amino acid glutamine (Q) changed to histidine (H) at seven sites (14, 52, 218, 613, 675, 677 and 1071) in 13 occurrences. Q677H was the most frequent of these, followed by Q613H. Except for one instance of the Q677R mutation (Karnataka), Q always changed to H at position 677 across the states. Another preferential substitution was methionine (M) to I, observed at four sites (731, 1050, 1229 and 1237), with M731I being the most frequent at three occurrences. Phenylalanine (F) to I (456 and 464) and Q to R (52 and 677) substitutions were observed at two sites each (Table 2).
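The substitution pattern described in this section can be tallied by grouping mutation labels by their (original residue, new residue) pair. The following sketch is illustrative only: the function names are hypothetical, the label list is a small subset of the mutations named above rather than the full data set, and deletions or stop codons are simply skipped.

```python
import re
from collections import defaultdict

# Plain substitutions only: reference residue, position, new residue.
SUBSTITUTION_RE = re.compile(r"^([A-Z])(\d+)([A-Z])$")


def substitution_sites(labels):
    """Map each (ref, alt) residue pair to the sorted sites where it occurs."""
    sites = defaultdict(set)
    for label in labels:
        match = SUBSTITUTION_RE.match(label)
        if match is None:
            continue  # skip deletions, stop codons and other label forms
        ref, pos, alt = match.groups()
        sites[(ref, alt)].add(int(pos))
    return {pair: sorted(positions) for pair, positions in sites.items()}


def preferred_substitution(labels, ref):
    """Return the replacement residue seen at the most sites for `ref`."""
    candidates = {
        alt: len(positions)
        for (r, alt), positions in substitution_sites(labels).items()
        if r == ref
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# A subset of the substitutions named in the text, used as toy input.
labels = [
    "T95I", "T716I", "T19R", "T478K", "T1116A",
    "Q677H", "Q613H", "Q677R", "M731I", "H69del",
]
pairs = substitution_sites(labels)
```

Here `preferred_substitution(labels, "T")` gives `"I"` and `preferred_substitution(labels, "Q")` gives `"H"`, mirroring the T-to-I and Q-to-H preferences reported above, although on this toy subset only.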

Gujarat (G)
A total of 96 mutations were observed in 314 samples analyzed from Gujarat state (Table 2) Table 3. Ten spike proteins of the B.1.1.7 lineage showed the 10 signature mutations only, while the remaining five showed convergent evolution. Of GISAID accessions EPI_ISL_825054 (eight mutations) and EPI_ISL_825059 (10 mutations), the former lacked H69del and V70del, while in the latter H69del was replaced by H69N and V70del by I68del. In accessions EPI_ISL_1544115, EPI_ISL_1544116 and EPI_ISL_1544117, the highest number of mutations (11 each) was recorded. S94F was added as the 11th mutation to the set of 10 signature mutations in the first two, and T1116A in the third. H69N, S94F and I68del were the three newly evolved mutations in this lineage in the state; they were absent from the signature mutations that defined the lineage.
Lineage B.1.351 was characterized by eight signature mutations (Table 3). In all the analyzed samples from Gujarat, seven of the eight mutations were observed, the exception being del241/243. This deletion was replaced by L18F in GISAID accessions EPI_ISL_1704309 and EPI_ISL_1703836. In the latter spike protein, an additional F855V mutation was also noted, taking the maximum number of mutations recorded for this lineage to nine. L18F and F855V were the novel mutations that arose in this lineage in the state.
In lineage B.1.525 in Gujarat, the characteristic signature mutation del144/145 was not observed in the analyzed samples; instead, a new mutation, R365I, arose. Q52V of the signature mutations changed to Q52R (Table 3).
After December 2020, in the second wave of infections, more transmissible and virulent lineages B.  (Table 3), was not observed in the state. Additionally, many new substitutions were observed at various sites. E154K and T95I formed the most frequent combination, visible in 15 of the 19 samples until February 2021. After that, another new mutation, H1101D, started becoming visible. Interestingly, T95I was not seen after February and never occurred in combination with H1101D in our samples. The total number of mutations ranged from six to eight in this lineage. Four of the signature mutations, L452R, E484Q, D614G and P681R, were observed in all the samples. Additional mutations included V3G, R21T, L141F, G142D, E154K, Q218H, T307I, E583Q, S698L, V1060L, D1153Y, N1187H and V1264L (Table 3, Figures 7 & 8).
In lineage B.1.1.7, nine samples were analyzed. Except for del69/70, which was observed in two of the analyzed samples, the remaining seven signature mutations were present in all the samples. The total number of mutations ranged from seven to ten; additional mutations included L84I, Y144del and Y145H (Table 3).
Three Seven signature mutations except del69/70 and del144/145 were visible in the state, though not in every sample. Sample EPI ISL 1716813 contained seven signature mutations and an additional new T572I mutation. In the remaining samples, for instance, four, three and two samples contained seven, six and two of the signature mutations, respectively, while three, four, and five mutations were represented in one sample each only (Tables 2 & 3).
In lineage B.1.525 samples, the number of mutations recorded ranged from four to five. Five of the signature mutations were observed: A67V, D614G, E484K, Q677H and F888L. del69/70 and del144/145 were not observed (Table 3). At spike protein site 52, valine (V) of the signature mutation was substituted by arginine (R); Q52R was hence a new substitution (Table 3 & Figure 7). The total number of reported mutations was nine for the South African lineage B.1.351. Seven signature mutations, all except del241/243, were visible in the analyzed samples. Additionally, the new substitutions G181A and S943P were observed (Table 3 & Figure 7).

Rajasthan (R)
Twelve mutations were detected in 16 samples and five mutations were unique. Clade O was found in four samples. D614G was the most common mutation found in 12 samples. N440K was the only mutation located in RBD of the spike protein (Table 2).

Haryana (H)
Three mutations were observed in 18 samples, with clade O represented in three samples. D614G was the dominant mutation, detected in 15 samples. P330S was the only mutation located in the RBD of the spike protein (Table 2).

Uttar Pradesh (UP)
The 20 samples analyzed from the state revealed seven mutations, six of which were unique. Clade O was detected in five samples, and D614G, the most frequent mutation, was present in all the remaining samples. A522S was the only mutation located in the RBD of the spike protein (Table 2).

Discussion
As the SARS-CoV-2-induced pandemic expanded, research efforts around the world started focusing on whole-genome sequencing, understanding epidemiology, documenting genetic diversity, analyzing the evolving mutations and their effects, monitoring the genomic evolution of the virus and developing diagnostics, therapeutics and vaccines. In the present analysis, detailed mutation analysis of human SARS-CoV-2 spike proteins sampled from ten different Indian states between March 2020 and August 2021, covering both the first and second waves of COVID-19 infections, revealed both a rise in the number of mutations and the induction of many novel, functionally significant mutations in the SARS-CoV-2 spike protein in India. The introduction of global lineages, the emergence of local variant strains, the induction of new mutations in the existing strains with the passage of time and laxity in COVID-19 appropriate behavior contributed to this upsurge. The number of spike mutations in various states varied from one to five during the first wave of COVID-19 infections, while a sudden spurt in the frequency of spike protein mutation took the maximum number to as high as thirteen per spike protein during the second wave of infections.
Since its emergence in 2019, the SARS-CoV-2 virus has been continuously evolving and has resulted in many novel lineages that emerged in different parts of the world. During the first wave of infections in India until September 2020, lineage B.1 with D614G mutation was visible in the analyzed samples. Later, between October and December 2020, lineage B.  [15], the Delta variant showed higher transmissibility and risk of hospitalization [16][17][18] and fuelled the massive devastation associated with the second wave of COVID-19 infections in India.
The further evolution in the prevalent global and local lineages in the analyzed samples was seen to be propelled by the induction of additional novel mutations which were visible both in terms of increased number and their diverse types. The maximum number of 13 mutations per spike protein was documented in the state of Maharashtra which battled a brutal and long phase of COVID-19 infections. From January to August 2021 (second wave) a total of 51 new mutations besides the listed signature mutations evolved in eight prevalent lineages in 763 samples from ten Indian states included in the present study. The emergence of many new mutations in addition to the signature mutations of specific lineages indicates a convergent evolution in the virus that might be responsible for an increase in transmission and pathogenesis. The selection pressure resulting from the rollout of the vaccination program in January 2021 along with other factors might have triggered the emergence of novel spike mutations in India.
Several studies have documented the role of naturally induced mutations in RBD and other regions of spike protein in viral entry, transmission, infectivity, pathogenesis and immune escape [7,19,20,21]. VOCs Alpha and Delta showed higher transmission rate and spread globally and VOCs Alpha and Beta were discovered to be resistant to neutralizing antibodies, thereby, affecting the effectiveness of vaccines [22].
In the context of spike RBD mutations, Delta sub-lineages characteristically contain two mutations each in the spike RBD and are commonly known as double mutants. B.1.617.1 (Kappa) and B.1.617.3 each contain the L452R and E484Q mutations, while B.1.617.2 (Delta) has L452R and T478K as the RBD mutations [24]. The L452R mutation is associated with increased infectivity through enhanced interaction between the RBD and the hACE2 receptor [25]. Co-operatively, the T478K and L452R mutations stabilize the RBD-ACE2 complex to increase the rate of virus infectivity and affect the immune response [20]. Another important spike mutation, P681R, leads to increased furin cleavage and thereby to greater infectivity and higher viral loads. Together, this cocktail of B.1.617.2 mutations imparts higher transmissibility, infectivity and immune evasion potential [20,25,26,27].
In the present samples, D614G has been the most widely occurring mutation, present in 697 of the 763 analyzed viruses. It has been shown to result in significantly higher transmission and host infectivity [28]. The other frequent mutations were N501Y, P681H, A570D and D1118H, observed in 77, 70, 60 and 59 of the 763 analyzed viruses, respectively. With respect to the state-wise distribution of mutations, the five most frequent spike mutation sites, with the states in which they occurred, were 501 (G, M, K, D, CH, T and MP), 484 (G, M, K, D, CH and T), 440 (G, M, K, D, CH and T), 452 (M, K and CH) and 417 (G, M, K, D and T). Interestingly, we observed a state-specific unique set of mutations in six of the ten analyzed states: A344T, R346T, R365I, N439K, Q493stop, S494P and Y508H in Gujarat; S373L and L517H in Maharashtra; V367F, D377Y and A522V in Delhi; P384L and E484D in Telangana; S399stop, F456I, P463H, F464I, E465K, R466K, S469T and A475G in Madhya Pradesh; and A522S in Uttar Pradesh. The differing levels of infection, pathogenicity and transmission in different states could possibly be due to these state-specific mutations. Functional characterization of many of the presently observed mutations has been reported in many studies [20,21,27,28,29,30,31,32]. Most of the RBD mutations strengthen RBD-ACE2 binding, supporting the evolution of the virus toward more infectious variants [21,27,29]. Mutations N501Y, D614G and others have been found to be associated with reinfection, partial resistance to vaccines and increased transmissibility. Mutations V367F, N354D and T478K in the SARS-CoV-2 RBD have been associated with enhanced hACE2 binding affinity and increased viral infectivity [21,27,29]. Mutations L452R and E484Q found in Indian variants have been shown to disrupt the binding between the RBD and many known antibodies, leading to vaccine escape [21].
Cherian et al. 2021 [20] attributed the higher pathogenicity, transmission and acute infections of SARS-CoV-2 in the state of Maharashtra to the presence of spike mutations L452R, T478K, E484Q and P681R. Further, Wang et al. [21] identified R403K, K417N/T, L452R, A475S, E484K, F486L, F490S/L, Q493L and S494P as the most likely vaccine escape mutations. The S1/S2 cleavage site mutations H655Y, N679K and P681H result in increased S1/S2 furin cleavage and facilitate efficient entry into the host [28,30,31,32]. Many spike mutations have been shown to affect the neutralization ability of monoclonal and polyclonal therapeutic and convalescent antibodies. While mutation N439K, which results in increased binding affinity for the ACE2 receptor, was observed to resist neutralization by monoclonal antibodies and by polyclonal antibodies from people who recovered from infection [31,33], the deletion mutation Y144del was observed to modulate the effects of neutralizing antibodies [30,34]. Likewise, mutations K417N/T, N439K, L452R, Y453F and N501Y have been listed as the most significant immune escape spike RBD mutations [35]. Further, multiple spike substitutions at positions 477 (S477G, S477N and S477R) and 484 (E484A, E484D and E484K) have characteristically shown resistance toward convalescent sera [28,31,35].
Interestingly, amidst the presence of so many mutant and more transmissible and virulent lineages, clade O (Wuhan isolate) was visible in India throughout the assessment period from March 2020 even until 2021. As many as 50 out of 763 analyzed spike proteins were observed representing clade O with no mutations observed in them relative to the human SARS-CoV-2 spike protein reference sequence from Wuhan, China. These samples were present in all the analyzed states except Chhattisgarh. A maximum of nine samples each were present in Gujarat and Delhi, while Haryana had the minimum number of three. Also, the initial lineage D614G co-existed in the population and remained visible throughout.
One of the most significant findings of the present analysis has been the identification of spike protein regions with no mutations: the S1B-S1C and S1C-S1D linkers and the fusion peptide domains. This information is important amidst reports of a significant reduction in vaccine efficacy and antibody neutralization for spike protein-based vaccines and monoclonal therapeutic antibodies [30,31,36]. In this context, the above-mentioned conserved spike regions can be important potential targets for developing future diagnostics, therapeutics or vaccines.

Conclusion
The analysis provides useful insights into the distribution of accrued mutations across the spike protein, their density in different regions of the spike protein, the frequency and state-wise occurrence of particular mutations, and the identification of mutation-rich, mutation-poor and mutation-free regions in the spike protein. Most importantly, it compares spike protein mutations during the first and second waves of infections to understand the evolutionary trajectory of SARS-CoV-2 in India. The spike protein now has an expanded mutational landscape, and its evolution seemingly has facilitated the transmission of the virus by modulating ACE2 receptor binding affinity or immune evasion, leading to reduced efficacy of vaccines and therapeutic antibodies. Continuous monitoring of emerging variants through mutation analysis and functional characterization of the induced mutations is important to track the further evolutionary course of the virus and to understand the effects of mutations on viral epidemiology. This information will eventually help in devising intervention strategies. Given that the characterized spike RBD mutations favor increased transmission, pathogenesis, immune evasion and reduced neutralization, it is probably time to look at the more conserved parts of the spike protein, or other parts of the virus, for designing the next-generation therapeutics and vaccines.

Summary points
• The present study tracks the evolutionary trajectory of SARS-CoV-2 spike protein during the first and second waves of COVID-19 in India.
• A detailed mutation analysis of the spike protein of 763 virus samples taken from the Global Initiative on Sharing All Influenza Data, belonging to 10 Indian states between March 2020 and August 2021, revealed the presence of 242 mutations corresponding to 207 spike sites. The mutations were observed to be differentially distributed across states and domains of the spike protein.
• The highest number of mutations occurred in the S1A domain. Interestingly, no mutations were detected in the S1B-S1C and S1C-S1D linkers and fusion peptide domains.
• The number of detected mutations rose from five per spike protein during the first COVID-19 wave to thirteen during the devastating second wave of COVID-19 infections in India.
• The five most frequent mutations were D614G, N501Y, P681H, A570D and D1118H.
• Fifty-one novel mutations emerged in the circulating lineages during the assessment period in 10 Indian states. Many of the mutations have previously been functionally characterized to show immune evasion and resistance to antibody neutralization.
• The evolution of the spike protein in the analyzed Indian states seems to be fuelled by the introduction of global lineages, the emergence of local lineages (Delta and its sub-lineages) and the induction of novel co-occurring mutations.
• The S1A domain that includes the N-terminal domain, receptor binding domain and receptor binding motif is heavily mutated. This is a matter of concern as these three important regions of the spike protein have been the primary targets of vaccines and antibody-based therapeutics including monoclonal antibodies (mAbs), polyclonal antibodies and convalescent plasma.
• Importantly, the study has identified mutation-rich and mutation-free regions in the spike protein. The conserved spike regions can be useful for designing future diagnostics, vaccines and therapeutics.